(57) Abstract

A method and apparatus for profiling a flow of event data packets. The method comprises the steps of: receiving data defining
sub-periods which partition a base time period, creating a profile of recent behaviour for each sub-period, and allocating each event data
packet to one of the sub-periods according to a time indication associated with the event data packet. The method and apparatus may be
used in anomaly detection within data streams and, in particular, account fraud detection where the event data relates to account usage.
METHOD AND SYSTEM FOR FRAUD DETECTION IN TELECOMMUNICATIONS
FIELD OF THE INVENTION 

The present invention relates to a method and apparatus for performing
pattern recognition within event streams, and a system incorporating the
same.

BACKGROUND TO THE INVENTION 

In recent years there has been a rapid increase in the number of
commercially operated telecommunications networks in general and in
particular wireless telecommunication networks. Associated with this
proliferation of networks is a rise in fraudulent use of such networks, the
fraud typically taking the form of gaining illicit access to the network, and
then using the network in such a way that the fraudulent user hopes
subsequently to avoid paying for the resources used. This may for
example involve misuse of a third party's account on the network so that
the perpetrated fraud becomes apparent only when the third party is
charged for resources which he did not use.

Since fraudulent use of a single account can cost a network operator a
large sum of money within a short space of time it is important that the
operator be able to identify and deal with the most costly forms of fraud at
the earliest possible time.

One of the steps employed in, but not limited to use in, such fraud
detection systems is pattern recognition from event streams.

Pattern recognition for event streams can be achieved by building up
profiles of the behaviour of an entity and performing pattern recognition
over these profiles. In order for an entity to be profiled in this way, the
entity must be able to have events associated with it. Examples of entities
are: a single subscriber in a telephone network, a user accessing a data
network, a switch in a telephone network or a server in a data network.
The events to be associated with the user must be able to be represented
in an Event Data Packet (EDP). The profiles of an entity's behaviour are
compared with known patterns of unacceptable behaviour to determine if
the system should alert the end user to the entity's behaviour pattern.

The flow of Event Data Packets 110 of information through a profiling
pattern recognition system is shown in Figure 1. The Recent profile 130
represents the typical usage for the entity over a recent period of time,
approximately the last week. The Historical profile 140 represents the
typical use for the entity over a preceding and longer time period, for
example approximately the last six weeks. The EDPs are all accumulated
into Polls of information. A Poll 120 is a set of EDPs received over a
particular time period (e.g. 4 hours). The Poll information is then used to
update the values in the Recent profile, and the Recent profile is then
used to update the values in the Historical profile. The solid arrow
between the EDPs and the Poll indicates that the information in each Poll
is directly created from the EDPs. The dotted arrow between the Poll and
the Recent profile indicates that the Poll information is used only to update
the Recent behaviour, as is true for the Recent to Historical.

In an example where the EDPs are Call Detail Records (CDRs) and the
profiles represent voice telephony usage, the profiles may consist
of the number of calls made and the duration of national and international
calls. Table 1 shows an example of Recent and Historical profiles for such
an example.



Period             Calls   National         International
                           Duration (sec)   Duration (sec)
Recent Profile     2.5     300              200
Historic Profile   2.0     250              200

Table 1: Voice telephony recent and historic profile example



• calls 3,
• national 500,
• international 100.

Then, after polling and once all updates to the Recent and Historic profiles
have completed, the Recent and Historic profiles may be as shown in
Table 2.

The new Recent profile is derived from the previous Recent profile plus a
proportion of the difference between the newly added Poll profile and the
previous Recent profile.

The new Historic profile is derived from the previous Historic profile plus a
proportion of the difference between the Recent profile and the previous
Historic profile, but the proportions typically differ from those of the
Recent profile case in that a higher proportion of the old Historic profile
is retained.



Period             Calls   National       International
                           Duration (s)   Duration (s)
Recent Profile     2.75    280            175
Historic Profile   2.1     255            195

Table 2: Voice telephony recent and historic profile example after update

It can be seen that the Recent profile has moved towards the newly
added Poll profile and the Historic profile towards the previous Recent
profile. These profiles provide a view of the entity's behaviour and how it
changes over time. The profiles of behaviour can then be used for pattern
recognition to identify which entities' behaviour reflects patterns to which
the user of the system wishes to be alerted.
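
The prior-art update described above, in which each profile moves part of the way towards a newer profile, can be sketched in Python as follows. This is an illustration only: the names `Profile` and `move_towards` and the two proportion values are assumptions not given in the text, so the results are not expected to reproduce Table 2.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    calls: float
    national_s: float
    international_s: float


def move_towards(old: Profile, target: Profile, proportion: float) -> Profile:
    """Return old + proportion * (target - old), field by field."""
    def step(o: float, t: float) -> float:
        return o + proportion * (t - o)

    return Profile(
        step(old.calls, target.calls),
        step(old.national_s, target.national_s),
        step(old.international_s, target.international_s),
    )


# Hypothetical proportions: the Historic profile keeps more of its old value.
RECENT_PROPORTION = 0.5
HISTORIC_PROPORTION = 0.1

poll = Profile(calls=3, national_s=500, international_s=100)
recent = Profile(calls=2.5, national_s=300, international_s=200)
historic = Profile(calls=2.0, national_s=250, international_s=200)

# The Recent profile moves towards the Poll; the Historic profile moves
# towards the previous Recent profile.
new_recent = move_towards(recent, poll, RECENT_PROPORTION)
new_historic = move_towards(historic, recent, HISTORIC_PROPORTION)
```

A smaller proportion for the Historic profile makes it respond more slowly, which is what lets it serve as a longer-term baseline.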
There are, however, the following limitations to the method described
above:

The Recent and Historic profiles are built up from a series of Poll profiles.
In order for the Recent and Historic profiles to maintain their integrity all
Poll profiles must cover the same amount of time, for example a 4 hour
period.

The period of time the Polls must all cover must not be too small,
otherwise natural variations in behaviour will appear to be anomalous. A
typical recommended minimum is two hours.

These two limitations, taken together, mean that this method
cannot be used for real time data feeds.

It is also incumbent upon the user to ensure that the data given to the
product is split into appropriately sized chunks. This can be a burden to
the user if, for example, hardware downtime means it is necessary to feed
a backlog of data into the system.

The profiles generated represent only the active periods for the user. This
means that a user who is active in only one two hour period a week could
have a similar profile to a user who is active in twenty of the two hour
periods in a week.

The nature of the data in the profile, as an average of activity in all X
minute periods where the user had actually been active, where X is the
duration of the Poll, is not intuitive to many end users of the system.

In order for pattern recognition to occur effectively, the known patterns
have to be represented in the same time period that the system polls
over. This can increase training times for the account fraud detection
system which analyses the Poll, Recent profile, and Historical profile
information in order to identify anomalies.

OBJECT OF THE INVENTION 

The invention seeks to provide an improved method and apparatus for
behavioural pattern recognition for event streams in general and for event
streams in account fraud detection systems in particular.
According to a first aspect of the present invention there is provided a
method of profiling a flow of event data packets comprising the steps of: 
receiving data defining a plurality of sub-periods which partition a base 
time period; creating a profile of recent behaviour for each of said sub- 
periods; allocating each Event Data Packet to one of said sub-periods 
according to a time indication associated with said Event Data Packet. 

The method may also comprise the steps of: creating a profile of historical 
behaviour for each of said sub-periods; at the end of said Base Time 
Period updating each of said Historical profiles responsive to the previous 
value of said Historical profile and a corresponding Recent profile, and 
resetting each said Recent profile. 

The method may also comprise the steps of: calculating an Event density 
for at least one of said Recent profiles. 

In a preferred embodiment, said step of calculating an Event density
comprises the steps of: identifying a current time; identifying a Recent 
profile within which said current time falls; dividing a number of events 
recorded in said Recent profile by a time duration determined by a 
difference between said current time and a start time of sub-period 
associated with said Recent profile. 
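
The density calculation of this preferred embodiment can be sketched as below. It is a minimal illustration, not the claimed implementation: the function name `event_density` and the use of Python `datetime` values for the current time and the sub-period start are assumptions.

```python
from datetime import datetime


def event_density(event_count: float, sub_period_start: datetime,
                  now: datetime) -> float:
    """Events per second since the start of the sub-period containing `now`."""
    elapsed_s = (now - sub_period_start).total_seconds()
    if elapsed_s <= 0:
        raise ValueError("current time must fall after the sub-period start")
    return event_count / elapsed_s
```

For example, nine events recorded between 08:00 and 09:30 give a density of 9/5400 events per second.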

Said Event Data may correspond to time intervals of differing length. 

The method may be used to capture a representation of inactivity within 
said flow. 

The method may also be used to permit trend analysis for an initial sub- 
period during said sub-period. 

According to a further aspect of the present invention there is provided a 
method of performing anomaly detection on a stream of Event Data 
Packets and comprising the steps of: receiving data defining a plurality of 
sub-periods which partition a base time period; creating a Recent profile 
for each of said sub-periods; allocating each Event Data Packet to a sub-
period according to a time indication in said Event Data Packet.
According to a further aspect of the present invention there is provided a
method of account fraud detection comprising the steps of: receiving data
defining a plurality of sub-periods which partition a base time period;
creating a Recent profile for each of said sub-periods; receiving a series
of Event Data Packets relating to account use; allocating each Event
Data Packet to a sub-period according to a time indication in said Event
Data Packet.

In a preferred embodiment account use relates to telecommunications 
network use. 

In a preferred embodiment said Event Data Packets are call detail
records.

According to a further aspect of the present invention there is provided a
method of network intrusion detection comprising the steps of: receiving
data defining a plurality of sub-periods which partition a base time period;
creating a Recent profile for each of said sub-periods; receiving a series
of Event Data Packets relating to account use; allocating each said Event
Data Packet to a sub-period according to a time indication in said Event
Data Packet.

In a preferred embodiment said Event Data Packets relate to network
audit log data.

In a preferred embodiment said Event Data Packets relate to IP packet 
data. 

According to a further aspect of the present invention there is provided a
system for profiling a flow of event data packets comprising: apparatus
arranged to receive and store data defining a plurality of sub-periods
which partition a base time period; apparatus arranged to create and
store a Recent profile for each of said sub-periods; allocating each Event
Data Packet to one of said sub-periods according to a time indication
associated with said Event Data Packet.

The system may be arranged to receive a plurality of flows and to perform
processing on each flow independently of each other.

According to a further aspect of the present invention there is provided a
system for performing anomaly detection on a stream of Event Data
Packets and comprising: apparatus arranged to receive and store data
defining a plurality of sub-periods which partition a base time period;
apparatus arranged to create a profile of recent behaviour for each of said
sub-periods; apparatus arranged to allocate each Event Data Packet to a
sub-period according to a time indication in said Event Data Packet.

According to a further aspect of the present invention there is provided a 
system for account fraud detection comprising: apparatus arranged to 
receive and store data defining a plurality of sub-periods which partition a 
base time period; apparatus arranged to create a profile of recent 
behaviour for each of said sub-periods; apparatus arranged to allocate 
each Event Data Packet to a sub-period according to a time indication in
said Event Data Packet. 

According to a further aspect of the present invention there is provided a
system for network intrusion detection comprising: apparatus arranged to
receive and store data defining a plurality of sub-periods which partition a
base time period; apparatus arranged to create a profile of recent
behaviour for each of said sub-periods; apparatus arranged to allocate
each Event Data Packet to a sub-period according to a time indication in
said Event Data Packet.

The invention also provides for a system for the purposes of profiling a flow
of event data packets which comprises one or more instances of
apparatus embodying the present invention, together with other additional 
apparatus. 

According to a further aspect of the present invention there is provided 
software on a machine readable medium arranged for profiling a flow of 
event data packets by: receiving data defining a plurality of sub-periods 
which partition a base time period; creating a Recent profile for each of 
said sub-periods; allocating each Event Data Packet to one of said sub- 
periods according to a time indication associated with said Event Data 
Packet. 

The preferred features may be combined as appropriate, as would be 
apparent to a skilled person, and may be combined with any of the 
aspects of the invention. 
In order to show how the invention may be carried into effect,
embodiments of the invention are now described below by way of
example only and with reference to the accompanying figures in which:

Figure 1 shows a block diagram of information flow in a behavioural
pattern recognition system in accordance with the prior art;

Figure 2 shows a block diagram of information flow in a behavioural
pattern recognition system in accordance with the present invention.

DETAILED DESCRIPTION OF INVENTION 

The method proposed here is illustrated in Figure 2. The EDPs 210 (in
this example taking the form of Call Detail Records (CDRs)) again feed
into a Poll 220 of information and the Poll information is used to update
the values in the Recent profiles 230a-f. In this case each entity has
associated with it multiple Recent Profiles (six are shown but more or
fewer may be used), where each Recent profile represents a period of
time within a week (though a longer or shorter base period could be used),
for example Saturday and Sunday between midnight and 8am. The
Recent Profiles together cover the whole of a week period. Each Recent
Profile has a related Historic Profile 240a-f which covers the same time
period.
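
Allocating an EDP to the Recent profile whose sub-period contains its timestamp can be sketched as follows. This is only an illustration: the six-way weekday/weekend split, the hour boundaries, and the names `SUB_PERIODS` and `allocate` are assumptions chosen so that the sub-periods partition a one-week base period.

```python
from datetime import datetime

# (label, is_weekend, start_hour, end_hour); the end hour is exclusive.
SUB_PERIODS = [
    ("weekdays 00:00-08:00", False, 0, 8),
    ("weekdays 08:00-18:00", False, 8, 18),
    ("weekdays 18:00-24:00", False, 18, 24),
    ("weekends 00:00-08:00", True, 0, 8),
    ("weekends 08:00-18:00", True, 8, 18),
    ("weekends 18:00-24:00", True, 18, 24),
]


def allocate(timestamp: datetime) -> int:
    """Return the index of the sub-period that contains the timestamp."""
    is_weekend = timestamp.weekday() >= 5  # Monday is 0, Saturday is 5
    for index, (_, weekend, start, end) in enumerate(SUB_PERIODS):
        if weekend == is_weekend and start <= timestamp.hour < end:
            return index
    raise ValueError("sub-periods do not partition the base period")
```

Because the allocation depends only on the time indication carried by each EDP, the size of the Poll that delivered the EDP plays no part in choosing the profile.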

Recent Profiles are filled until they contain all the data for the time period 
they cover. Once filled the values are used to update the corresponding 
Historic profile, and then the Recent profile values are reset to zero, and 
filled with the next CDRs in the time covered by the profile. 
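
The fill, update and reset cycle can be sketched as below. The function name `roll_over`, the dictionary representation of a profile, the weighted form of the Historic update and the proportion of 0.1 are all assumptions made for illustration; the text only requires that the Historic profile be updated from the completed Recent profile before the Recent profile is reset.

```python
def roll_over(recent: dict, historic: dict, proportion: float = 0.1) -> None:
    """Fold a completed Recent profile into its Historic profile, then reset it."""
    for key, value in recent.items():
        # Move the Historic value part of the way towards the Recent value.
        historic[key] += proportion * (value - historic[key])
        # Reset the Recent value so the next sub-period starts from zero.
        recent[key] = 0.0


# Illustrative values only.
recent = {"calls": 4.0, "national_s": 300.0, "international_s": 425.0}
historic = {"calls": 8.5, "national_s": 800.0, "international_s": 250.0}
roll_over(recent, historic)
```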

For example, a customer of voice telephony may have the Recent profiles
of behaviour illustrated in Table 3 and corresponding Historic profiles
illustrated in Table 4.



Profile   Period                  Calls   National       International
Number                                    Duration (s)   Duration (s)
1         Weekdays, 00:00-08:00   1       25             0
2         Weekdays, 08:00-18:00   10      500            400
3         Weekdays, 18:00-24:00   0       0              0
4         Weekends, 00:00-08:00   0       0              0
5         Weekends, 08:00-18:00   5       255            15
6         Weekends, 18:00-24:00   0       0              0

Table 3: Voice telephony recent profiles example



Profile   Period                  Calls   National       International
Number                                    Duration (s)   Duration (s)
1         Weekdays, 00:00-08:00   1.5     30             2
2         Weekdays, 08:00-18:00   8.5     800            250
3         Weekdays, 18:00-24:00   2       25             15
4         Weekends, 00:00-08:00   0       0              0
5         Weekends, 08:00-18:00   2       25             19
6         Weekends, 18:00-24:00   0       0              0

Table 4: Voice telephony historic profiles example



A collection of Event Data (CDRs) is then presented to the system. The
CDRs cover 7am on a Monday through to 1pm on the same Monday. The
previous collection of data presented to the system had contained a CDR
for 8am on the same Monday.

The CDR at 7am is added to Recent Profile 1. When this profile is 
'complete' the historic profile is updated. When the next time period is 
entered its recent profile values are reset to zero and new values 
accumulated. 

The Recent and Historical profiles after the data has been processed
are as illustrated in Tables 5 and 6 respectively.



Profile   Period                  Calls   National       International
Number                                    Duration (s)   Duration (s)
1         Weekdays, 00:00-08:00   2       355            0
2         Weekdays, 08:00-18:00   4       300            425
3         Weekdays, 18:00-24:00   0       0              0
4         Weekends, 00:00-08:00   0       0              0
5         Weekends, 08:00-18:00   5       255            15
6         Weekends, 18:00-24:00   0       0              0

Table 5: Voice telephony recent profiles after processing
Profile   Period                  Calls   National       International
Number                                    Duration (s)   Duration (s)
1         Weekdays, 00:00-08:00   2.0     62.5           1.8
2         Weekdays, 08:00-18:00   8.05    750            267.5
3         Weekdays, 18:00-24:00   2       25             15
4         Weekends, 00:00-08:00   0       0              0
5         Weekends, 08:00-18:00   2       25             19
6         Weekends, 18:00-24:00   0       0              0

Table 6: Voice telephony historic profiles after processing



The only Recent profiles changed are those that cover the same time
period as the CDRs in the poll, namely periods 1 and 2. The only Historic
profile changed is in period 1, the values in the Recent profile having been
used to update the Historic profile. After updating the Historic profile, the
Recent profile is then reset to zero before new CDR information is added
to it.

Historic profiles are only updated once the Recent profile has been filled
with all the information for that time period. This means that the size of the
Poll has no influence over the Historic profiles, and the Recent profiles
can contain details for any sub-period of the time period they cover, or the
whole time period.
The profiles of behaviour are converted into Event Densities before
pattern recognition is performed on them. Event Densities are produced
by dividing the event data value by the number of seconds in the period
during which those events occurred. For example, Table 7 shows an
example set of Historic profile values and the corresponding event
density values where the period covered 14400 seconds (4 hours).



Period                    Calls          National       International
                                         Duration       Duration
Historic Profile Values   10             200 s          300 s
Event Densities           10/14400       200/14400      300/14400
                          (= 0.00069)    (= 0.01389)    (= 0.02083)

Table 7: Voice telephony historic profiles after processing

Event densities for historic profiles provide an average of behaviour over
the whole time period. This means that dividing by the number of seconds
in the time period gives the normal amount of behaviour in any one
second. These are generally small values.
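
The conversion of a complete profile into event densities is a simple division, sketched below using the 4 hour (14400 second) example. The function name `event_densities` and the dictionary keys are assumptions made for illustration.

```python
def event_densities(profile: dict, period_seconds: float) -> dict:
    """Divide each profile value by the number of seconds in its period."""
    return {name: value / period_seconds for name, value in profile.items()}


historic = {"calls": 10, "national_s": 200, "international_s": 300}
densities = event_densities(historic, 4 * 60 * 60)  # 14400 seconds
```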

Recent profiles however may or may not contain values for the whole of the
time period they cover. Frequently the Recent profile that is being
analysed is not yet complete. For example, if ten minutes of event data
require analysing for the time period 9.15am to 9.25am then a recent
profile that covers the time period 8am to 6pm will be updated, but the
time period for this profile is not yet complete. As the period is incomplete
the number of seconds to divide by is calculated as follows. The complete
time period is divided into blocks of time, for example 30 minutes. A usage
period consists of x of these blocks of time. The event data in the current
incomplete Recent profile is divided by the number of seconds in the
blocks covered so far. So event data covering up to 9.25 am has covered
three 30 minute blocks so far and the values are divided by 5400 seconds
(90 minutes). Conversion into densities enables pattern recognition to be
performed over event data that covers just a portion of the total time
period.
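
The divisor for an incomplete Recent profile can be sketched as follows, using the 8am to 9.25am example with 30 minute blocks. The function name `elapsed_block_seconds` is an assumption, as is the reading that a partly covered block counts in full, which is what the three-block figure in the example implies.

```python
import math
from datetime import datetime


def elapsed_block_seconds(period_start: datetime, data_end: datetime,
                          block_seconds: int = 1800) -> int:
    """Seconds in the blocks entered so far, counting a partly covered block in full."""
    elapsed = (data_end - period_start).total_seconds()
    if elapsed <= 0:
        raise ValueError("data must end after the period starts")
    return math.ceil(elapsed / block_seconds) * block_seconds


start = datetime(2024, 1, 1, 8, 0)   # sub-period starts at 8am
end = datetime(2024, 1, 1, 9, 25)    # event data covers up to 9.25am
divisor = elapsed_block_seconds(start, end)  # three 30 minute blocks
```

Dividing the values in the incomplete Recent profile by `divisor` gives densities that can be compared with those of a complete Historic profile.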
• the polls of event data can be of any size whilst still allowing the
profiles produced by the system to maintain their integrity;

• polls of data for very small time periods can be handled easily;

• the preceding two advantages have the consequence that the
system is suitable for both real time feeds and bulk batch feeds of
poll data;

• there is consequently no burden on the end user to divide up the
event data into fixed sized chunks; and

• the profiles represent accurately the behaviour of the user,
including a representation of inactivity by the user, and a
representation of the time of use.

This method may be used in several application areas. These include
telephony fraud detection using call detail records (CDRs), anomaly
detection on data streams, network intrusion detection using audit log
data or IP packet data. The method also provides a means of comparison
between recent behaviour and past behaviour for event streams that has
potentially wide application for the rapid detection of behavioural changes.

Any range or device value given herein may be extended or altered
without losing the effect sought, as will be apparent to the skilled person
for an understanding of the teachings herein.
CLAIMS

1. A method of profiling a flow of event data packets comprising
the steps of:

receiving data defining a plurality of sub-periods which partition
a base time period;

creating a profile of recent behaviour for each of said sub-
periods;

allocating each Event Data Packet received to one of said sub-
periods according to a time indication associated with said Event Data
Packet.

2. A method according to claim 1 comprising the steps of: 

creating a profile of historical behaviour for each of said sub- 
periods; 

at the end of said Base Time Period updating each of said
Historical profiles responsive to the previous value of said Historical profile
and a corresponding Recent profile, and resetting each said Recent
profile.

3. A method according to any one of claims 1-2 additionally
comprising the step of:

calculating an Event density for at least one of said Recent
profiles.

4. A method according to claim 3 wherein said step of calculating 
an Event density comprises the steps of: 

identifying a current time; 

identifying a Recent profile within which said current time falls;

dividing a number of events recorded in said Recent profile by a 
time duration determined by a difference between said current time and a 
start time of sub-period associated with said Recent profile. 
5. A method according to any one of claims 1-4, wherein said
Event Data may correspond to time intervals of differing length.



6. A method according to any one of claims 1-5, whereby to 
capture a representation of inactivity within said flow. 

7. A method according to any one of claims 1-6, whereby to 
permit trend analysis for an initial sub-period during said sub-period. 

8. A method of performing anomaly detection on a stream of Event 
Data Packets and comprising the steps of: 

receiving data defining a plurality of sub-periods which partition 
a base time period; 

creating a Recent profile for each of said sub-periods; 

allocating each Event Data Packet to a sub-period according to 
a time indication in said Event Data Packet. 

9. A method of account fraud detection comprising the steps of: 

receiving data defining a plurality of sub-periods which partition 
a base time period; 

creating a Recent profile for each of said sub-periods; 

receiving a series of Event Data Packets relating to account
use;

allocating each Event Data Packet to a sub-period according to 
a time indication in said Event Data Packet. 

10. A method of account fraud detection according to claim 9, 
wherein said account use relates to telecommunications network use. 

11. A method of account fraud detection according to any one of 
claims 9-10, wherein said Event Data Packets are call detail records. 



12. A method of network intrusion detection comprising the steps
of:

receiving data defining a plurality of sub-periods which partition
a base time period;

creating a Recent profile for each of said sub-periods;

receiving a series of Event Data Packets relating to account
use;

allocating each said Event Data Packet to a sub-period
according to a time indication in said Event Data Packet.

13. A method of network intrusion detection according to claim 12, 
wherein said Event Data Packets relate to network audit log data. 

14. A method of network intrusion detection according to claim 12, 
wherein said Event Data Packets relate to IP packet data. 

15. A system for profiling a flow of event data packet polls 
comprising: 

apparatus arranged to receive and store data defining a plurality 
of sub-periods which partition a base time period; 

apparatus arranged to create and store a Recent profile for 
each of said sub-periods; 

allocating each Event Data Packet in said Poll to one of said 
sub-periods according to a time indication associated with said Event 
Data Packet. 

16. A system according to claim 15 arranged to receive a plurality of
flows and to process each flow independently of each other.

17. A system for performing anomaly detection on a stream of 
Event Data Packets and comprising: 

apparatus arranged to receive and store data defining a plurality 
of sub-periods which partition a base time period; 

apparatus arranged to create a Recent profile for each of said 
sub-periods; 
apparatus arranged to allocate each Event Data Packet to a
sub-period according to a time indication in said Event Data Packet.

18. A system for account fraud detection comprising:

apparatus arranged to receive and store data defining a plurality
of sub-periods which partition a base time period;

apparatus arranged to create a profile of recent behaviour for
each of said sub-periods;

apparatus arranged to allocate each Event Data Packet to a
sub-period according to a time indication in said Event Data Packet.

19. A system for network intrusion detection comprising:

apparatus arranged to receive and store data defining a plurality
of sub-periods which partition a base time period;

apparatus arranged to create a profile of recent behaviour for
each of said sub-periods;

apparatus arranged to allocate each Event Data Packet to a
sub-period according to a time indication in said Event Data Packet.

20. Software on a machine readable medium arranged for profiling
a flow of event data packet polls by:

receiving data defining a plurality of sub-periods which partition
a base time period;

creating a profile of recent behaviour for each of said sub-
periods;

allocating each Event Data Packet in said Poll to one of said sub-
periods according to a time indication associated with said Event Data
Packet.